Exploratory Data Analysis of Greenhouse Gas Emissions in Canada¶
Introduction¶
Greenhouse gasses (GHG) in the atmosphere have been suggested to cause environmental threats such as global warming, sea level rise, and increased extreme weather events (Conference Board of Canada, 2013). As of 2020, the GHG emissions of Canada were the 11th highest of any country (ECCC, 2023). Although Canada makes up approximately 0.48% of the global population, its GHG emissions account for 1.5% of total emissions worldwide (United Nations, 2023; Statistics Canada, 2023; ECCC, 2023). As part of the Paris Agreement in 2015, the Government of Canada (GoC) has committed to reduce GHG emissions by 30% in relation to its 2005 levels, by 2030 (ECCC, 2023). More recently, a 2023 GoC initiative has stated a higher target of 40-45% GHG reductions from 2005 levels by 2030 (ECCC, 2023). As the sources of Canadian GHG emissions are geographically varied and are associated with a range of activities/economic sectors, the path to reducing emissions will involve a study of these sources in the context of GHG reduction opportunities. By comparatively visualizing provincial per-capita GHG emission, historical sector-specific Canadian GHG emissions, and the relation between Canadian Gross Domestic Product (GDP) and sector-specific GHG emissions, this project aims to contribute to such a study. The goal of this project is to provide a clear and informative overview of GHG emissions in Canada, with a focus on quantifying the relationship between GHG emissions and economic activity.
Data Set¶
The datasets in this project will be pulled from publicly accessible government websites. As per the terms and conditions of both the Natural Resources Canada website and the Canada Energy Regulator website; content can be reproduced and used as long as it is for non-commercial purposes, without any additional permission required.
One of the datasets originates from the Canada Energy Regulator website; this showcases a map of Canada with a breakdown of energy emissions usage based on each province. The data will be pulled in a treemap format from the website and stored in a tabular format. Each column will indicate a year from 2010-2020 and each row will indicate an observation based on a single province/territory (C. E. R., 2023). The plan is to create two tabular datasets from this format; one of which will present Natural Gas usage based on each province and the second will present Crude Oil usage based on each province.
The next two datasets will be pulled from the Natural Resources Canada website. Both these datasets will have columns from 2000-2020. The first dataset will present ‘Canada’s GHG Emissions by Sector, End Use and Subsector’. The observations in the first dataset will be split up into the following categories: Residential, Commercial/Institutional, Industrial, Passenger Transportation, Freight Transportation, Off-Road and Agriculture. The values of each observation is listed in the Megatonne unit of measure (Natural Resources Canada, 2023). Both datasets will be presented in a tabular format. The second dataset consists of ‘Commodity Prices and Background Indicators’. As with the data pulled from the Canada Energy Regulator website, this data will also support the main topic in addressing GHG Emissions within Canada. The rows are subdivided into ‘Commodity Prices and Total GDP’ consisting of each sector (Natural Resources Canada, 2023).
The final dataset encompasses the population of Canada based on each province. The dataset, also presented in tabular format, shows the breakdown of population on a quarterly basis from 2000 through 2020, and each row is separated based on province or territory. This is directly pulled from the Statistics Canada website (Statistics Canada, 2023). One important thing to note is that these population values will be estimates.
Guiding Questions¶
Collectively, the guiding questions of this project are designed to depict the spatial-temporal trends of GHG emissions in Canada. The purpose of these depictions are to highlight potential opportunities for reductions in GHG emissions; per-capita representations of spatial GHG emissions are likely to be more relevant than absolute abundances. Absolute levels of GHG emissions may over-emphasize areas that have high GHG emissions due simply to a larger population burden. Similarly, the GHG emissions of a specific economic sector may be best represented relative to the economic contribution of the same sector (measured as a proportion of GDP). In this way, modeling per-capita provincial GHG emissions and sector-specific GHG emissions relative to economic output may reveal areas wherein GHG emissions have been proportionally reduced and, more importantly, areas of potential improvement.
How have GHG emissions relative to economic output changed over the past two decades in the major sectors of Canada’s economy?
As part of this question, the size of an economic sector will be compared to its GHG emissions. Visualizing this relationship should provide foundational insight into the sources of GHG emissions. Further insights can then be drawn by visualizing the trend in GHG emissions per dollar value of an economic sector. In theory, a decrease in the proportion of GHG emissions per dollar could signify that the sector is becoming more efficient, or ‘green’. In contrast, sectors where GHG emissions are increasing per dollar may warrant further investigation as candidates for GHG emission reduction. Although the relationship between productivity and sector-specific GHG emissions will be informative, the price of commodities may have an important impact on GHG emissions that is not fully captured by the GHG per dollar ratio. In particular, it is known that higher oil and gas prices enable unconventional oil and gas production methods to become economically viable (Helbling, 2013). Different production methods are likely to have different GHG emission profiles, which may have a material impact on the overall GHG emissions of a sector.
How have the per-capita GHG emissions of Canadian Provinces and Territories changed over the past two decades?
Each region plays a unique role in contributing to Canada’s emission profile, with varying levels of economic activity, energy resources, and climate policies. Understanding how GHG emissions over the past two decades have evolved at the provincial and territorial level provides insights into which regions are making progress in reducing emissions, which ones are maintaining emissions, and which ones require more targeted efforts. Insights can be drawn by visualizing trends in per capita emissions for each province and territory. To provide a comprehensive analysis, the potential impacts of climate policies will be considered in each region by identifying key climate policies and assessing their timelines. Understanding the per capita emissions by region may help us evaluate the equity of emission reductions, in that the burden of reducing emissions is equitably distributed among regions, considering their economic and energy profiles.
Data Wrangling and Visualization for Guiding Question 1¶
As our first guiding question looked at the economic impact of GHG emissions, we began by analyzing GHG emissions by sector and the commodity prices which provided GDP values for Canada's largest sectors. The initial dataset consisting of subsectors had duplicate titles for similar GHG end uses (e.g. Space Heating is found in both residential and commercial properties). To account for this, we decided to rename the respective rows with a hyphen to indicate which sector it falls under. We removed the subheaders with totals for each sector to focus our data on the end uses and to create a separate table for column totals (Ghg1 table).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#import both the GHG Emissions by Sector file and the Commodity Prices and Background Indicators file
Ghg = pd.read_excel("GHG Emissions by Sector.xlsx")
Ghg1 = pd.read_excel("GHG Emissions by Sector.xlsx")
CB = pd.read_excel("Commodity Prices and Background Indicators.xlsx")
CB1 = pd.read_excel("Commodity Prices and Background Indicators.xlsx")
#Ghg1 and CB1 will be used solely for the purpose of filtering and merging both of these dataframes
#The resulting dataframe is DF3
Ghg = Ghg.round(2) #Removed additional significant digits and limited it to 2 per value.
Ghg = Ghg.rename(columns={'Unnamed: 0': 'Source of GHG'}) #Added a title for column indicating each source
#Changed the names of values that had duplicate entries to better reflect the source of GHG.
Ghg.loc[2, 'Source of GHG'] = 'Space Heating - Residential'
Ghg.loc[3, 'Source of GHG'] = 'Water Heating - Residential'
Ghg.loc[7, 'Source of GHG'] = 'Lighting - Residential'
Ghg.loc[8, 'Source of GHG'] = 'Space Cooling - Residential'
Ghg.loc[10, 'Source of GHG'] = 'Space Heating - Commercial'
Ghg.loc[11, 'Source of GHG'] = 'Water Heating - Commercial'
Ghg.loc[14, 'Source of GHG'] = 'Lighting - Commercial'
Ghg.loc[15, 'Source of GHG'] = 'Space Cooling - Commercial'
Ghg.loc[31, 'Source of GHG'] = 'Light Trucks - Passenger'
Ghg.loc[34, 'Source of GHG'] = 'Air - Passenger'
Ghg.loc[35, 'Source of GHG'] = 'Rail - Passenger'
Ghg.loc[37, 'Source of GHG'] = 'Light Trucks - Freight'
Ghg.loc[40, 'Source of GHG'] = 'Air - Freight'
Ghg.loc[41, 'Source of GHG'] = 'Rail - Freight'
#removed subheadings which had total values of GHG per sector to better focus our dataset.
#Ghg.drop([0, 1, 9, 17, 28, 29, 36], axis=0, inplace=True)
The second dataset for our first guiding question entailed GDP generated by the biggest sectors in Canada on a yearly basis, namely Industrial, Commercial and Agriculture. The dataset also included GDP through electricity generation. For the purpose of our analysis, we focused on the 3 main sectors. Later on, we will be combining these two datasets (GHG Emissions by Sector and GDP by Sector).
#dropped initial row which showed total GHG Emissions to focus on each sector specifically.
CB.drop([0], axis=0, inplace=True)
CB = CB.round(2)
CB = CB.rename(columns={'Unnamed: 0': 'Source of GHG'})
CB
| Source of GHG | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Industrial | 407435.20 | 404893.90 | 414638.90 | 419498.50 | 433942.00 | 445641.20 | 449969.50 | 448495.90 | 437702.20 | ... | 437861.50 | 447856.50 | 459845.50 | 479792.90 | 473645.10 | 464860.20 | 486485.20 | 503277.60 | 501449.5 | 467445.0 |
| 2 | Commercial/Institutional | 823297.00 | 849552.00 | 882345.00 | 903023.00 | 932062.00 | 962288.00 | 1000647.00 | 1032522.00 | 1051936.00 | ... | 1105968.00 | 1126960.00 | 1152136.00 | 1179705.00 | 1196301.00 | 1218749.00 | 1252804.00 | 1289520.00 | 1323236.0 | 1273834.0 |
| 3 | Agriculture | 21276.05 | 21233.06 | 20139.70 | 22995.43 | 25620.29 | 25919.98 | 25881.79 | 24518.72 | 26710.91 | ... | 25804.04 | 26274.75 | 31980.63 | 28653.57 | 30322.48 | 32602.95 | 33461.52 | 33133.78 | 35179.4 | 37025.0 |
| 4 | Electricity Generation | 26218.92 | 25645.92 | 26979.08 | 27557.83 | 27696.92 | 29514.00 | 29096.17 | 30849.00 | 32367.00 | ... | 31475.92 | 30627.00 | 30962.00 | 31990.00 | 32397.00 | 33540.00 | 33989.00 | 34544.00 | 34824.0 | 34176.0 |
4 rows × 22 columns
The next task was to take our 3 main sectors - Industrial, Commercial, Agriculture and combine the amount of GHG emissions that were emitted by each sector and compare that to the total GDP of that given year. We will need the row of totals from the GDP dataset, as well as the sector totals for each year which we had previously removed from the Ghg dataframe.
#this will pull the rows consisting of totals for each sector from our Ghg dataset.
Ghg1 = Ghg1.iloc[[9, 17, 44]]
Ghg1 = Ghg1.rename(columns={'Unnamed: 0': 'Source of GHG'})
Ghg1.loc['Total'] = Ghg1.sum(numeric_only=True)
#we also created an additional row which sums the total GHG per year.
Ghg1.loc['Total', 'Source of GHG'] = 'Total GHG per Year'
Ghg1
| Source of GHG | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | Commercial/Institutional (Mt of CO2e) | 55.105833 | 56.536416 | 59.064334 | 62.027890 | 59.021386 | 55.983739 | 52.163896 | 54.511811 | 53.391508 | ... | 50.091385 | 46.805387 | 47.566451 | 49.606041 | 49.399903 | 49.398433 | 52.325620 | 52.259235 | 54.232999 | 50.498562 |
| 17 | Industrial (Mt of CO2e) | 160.355409 | 158.726351 | 159.201689 | 165.741093 | 168.367031 | 164.390842 | 164.723550 | 174.256674 | 165.301070 | ... | 168.166005 | 168.377670 | 166.845965 | 167.363194 | 171.675708 | 166.879338 | 171.757890 | 179.198073 | 181.695153 | 165.606838 |
| 44 | Agriculture (Mt of CO2e) | 15.602440 | 15.001209 | 14.327296 | 14.981179 | 14.866488 | 15.129819 | 15.242616 | 15.702339 | 15.764766 | ... | 16.938284 | 16.471504 | 17.153562 | 17.757157 | 18.346355 | 18.554234 | 18.988716 | 19.568804 | 18.843595 | 17.999334 |
| Total | Total GHG per Year | 231.063683 | 230.263976 | 232.593319 | 242.750162 | 242.254905 | 235.504400 | 232.130062 | 244.470823 | 234.457344 | ... | 235.195673 | 231.654562 | 231.565977 | 234.726392 | 239.421967 | 234.832005 | 243.072227 | 251.026112 | 254.771747 | 234.104735 |
4 rows × 22 columns
#For the purpose of our analysis we removed the electricity generation row and focused on Industrial, Commercial, Agriculture GDP values.
CB1.drop([0], axis=0, inplace=True)
CB1.drop([4], axis=0, inplace=True)
CB1 = CB1.rename(columns={'Unnamed: 0': 'GDP Generated by Sector'})
CB1.loc['Total'] = CB1.sum(numeric_only=True)
#we also created an additional row which sums the total GDP generated each year
CB1.loc['Total', 'GDP Generated by Sector'] = 'Total GDP Per Year'
CB1 = CB1.round(2)
CB1
| GDP Generated by Sector | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Industrial | 407435.20 | 404893.90 | 414638.9 | 419498.50 | 433942.00 | 445641.20 | 449969.50 | 448495.90 | 437702.20 | ... | 437861.50 | 447856.50 | 459845.50 | 479792.90 | 473645.10 | 464860.20 | 486485.20 | 503277.60 | 501449.5 | 467445.0 |
| 2 | Commercial/Institutional | 823297.00 | 849552.00 | 882345.0 | 903023.00 | 932062.00 | 962288.00 | 1000647.00 | 1032522.00 | 1051936.00 | ... | 1105968.00 | 1126960.00 | 1152136.00 | 1179705.00 | 1196301.00 | 1218749.00 | 1252804.00 | 1289520.00 | 1323236.0 | 1273834.0 |
| 3 | Agriculture | 21276.05 | 21233.06 | 20139.7 | 22995.43 | 25620.29 | 25919.98 | 25881.79 | 24518.72 | 26710.91 | ... | 25804.04 | 26274.75 | 31980.63 | 28653.57 | 30322.48 | 32602.95 | 33461.52 | 33133.78 | 35179.4 | 37025.0 |
| Total | Total GDP Per Year | 1252008.25 | 1275678.96 | 1317123.6 | 1345516.93 | 1391624.29 | 1433849.18 | 1476498.29 | 1505536.62 | 1516349.11 | ... | 1569633.54 | 1601091.25 | 1643962.13 | 1688151.47 | 1700268.58 | 1716212.15 | 1772750.72 | 1825931.38 | 1859864.9 | 1778304.0 |
4 rows × 22 columns
Finally, we merged the two dataframes (one consisting of the total GDP generated by sector per year and the second dataset which shows the total GHG emitted by sector per year). Recall, we focused on three main sectors (Industrial, Commercial and Agriculture). Once merged, we proceeded to transpose the data to have each year show by row value as the index and each respective column indicate the sector. The last two columns showcase the total GHG per Year and total GDP per Year.
With both dataframes merged together, this makes it easier to compare and visualize our dataset to address our guiding question.
DF3 = Ghg1.merge(CB1, how='outer')
DF3 = DF3.transpose()
DF3 = DF3.drop(DF3.columns[[4, 5, 6]], axis=1)
DF3 = DF3.drop(labels=["GDP Generated by Sector"], axis=0, inplace=False)
DF3.columns = DF3.iloc[0]
DF3 = DF3[1:]
DF3
DF3.columns = ['Commercial/Institutional (Mt of CO2e)', 'Industrial (Mt of CO2e)','Agriculture (Mt of CO2e)','Total GHG per Year', 'Total GDP Per Year']
DF3
| Commercial/Institutional (Mt of CO2e) | Industrial (Mt of CO2e) | Agriculture (Mt of CO2e) | Total GHG per Year | Total GDP Per Year | |
|---|---|---|---|---|---|
| 2000 | 55.105833 | 160.355409 | 15.60244 | 231.063683 | 1252008.25 |
| 2001 | 56.536416 | 158.726351 | 15.001209 | 230.263976 | 1275678.96 |
| 2002 | 59.064334 | 159.201689 | 14.327296 | 232.593319 | 1317123.6 |
| 2003 | 62.02789 | 165.741093 | 14.981179 | 242.750162 | 1345516.93 |
| 2004 | 59.021386 | 168.367031 | 14.866488 | 242.254905 | 1391624.29 |
| 2005 | 55.983739 | 164.390842 | 15.129819 | 235.5044 | 1433849.18 |
| 2006 | 52.163896 | 164.72355 | 15.242616 | 232.130062 | 1476498.29 |
| 2007 | 54.511811 | 174.256674 | 15.702339 | 244.470823 | 1505536.62 |
| 2008 | 53.391508 | 165.30107 | 15.764766 | 234.457344 | 1516349.11 |
| 2009 | 50.680115 | 153.227149 | 13.307905 | 217.215169 | 1468674.79 |
| 2010 | 50.474459 | 162.094963 | 15.63274 | 228.202162 | 1520466.55 |
| 2011 | 50.091385 | 168.166005 | 16.938284 | 235.195673 | 1569633.54 |
| 2012 | 46.805387 | 168.37767 | 16.471504 | 231.654562 | 1601091.25 |
| 2013 | 47.566451 | 166.845965 | 17.153562 | 231.565977 | 1643962.13 |
| 2014 | 49.606041 | 167.363194 | 17.757157 | 234.726392 | 1688151.47 |
| 2015 | 49.399903 | 171.675708 | 18.346355 | 239.421967 | 1700268.58 |
| 2016 | 49.398433 | 166.879338 | 18.554234 | 234.832005 | 1716212.15 |
| 2017 | 52.32562 | 171.75789 | 18.988716 | 243.072227 | 1772750.72 |
| 2018 | 52.259235 | 179.198073 | 19.568804 | 251.026112 | 1825931.38 |
| 2019 | 54.232999 | 181.695153 | 18.843595 | 254.771747 | 1859864.9 |
| 2020 | 50.498562 | 165.606838 | 17.999334 | 234.104735 | 1778304.0 |
To take our data visualization skills further, we were able to take our merged dataframe (DF3) and create a visual dashboard. Through the use of hvplot in Pandas and the panel library in Python, we can seamlessly create a dashboard which users can toggle between different sectors and generate a quick line and bar plot showcasing GHG emissions by sector as well as total GHG and GDP generated on a yearly basis.
import pandas as pd
import numpy as np
import panel as pn
pn.extension("tabulator")
import hvplot.pandas
The following code below will generate an interactive dataframe using our DF3 dataframe. This will generate a table with the year column and one of the 5 columns from the DF3 dataframe along with a toggle to switch between the different columns.
DF3 = DF3.reset_index()
iDF3 = DF3.interactive()
yaxis_ghg = pn.widgets.RadioButtonGroup(
name='Y axis',
options=['Commercial/Institutional (Mt of CO2e)', 'Industrial (Mt of CO2e)', 'Agriculture (Mt of CO2e)','Total GHG per Year','Total GDP Per Year'],
button_type='success'
)
ghg_pipeline = (
iDF3
.groupby(['index'])[yaxis_ghg].mean()
.to_frame()
.reset_index()
.sort_values(by='index')
.reset_index(drop=True)
)
The code below will use the interactive dataframe we created and generate a line plot and bar plot that can toggle between the various columns. The x-axis will contain the years from our first columns (2000-2020).
co2_plot = ghg_pipeline.hvplot(x = 'index', y=yaxis_ghg,line_width=4, title="GHG emissions by Sector & Totals")
ghg_bar_plot = ghg_pipeline.hvplot(kind='bar', x='index', y=yaxis_ghg, title='GHG Emissions by Usage and Total GDP')
Using the FastListTemplate from the Panel library, we are able to use both plots created and implement it into a dashboard.
template = pn.template.FastListTemplate(
title='GHG Emissions across Canada',
sidebar=[pn.pane.Markdown("# GHG Emissions and their impact on GDP")],
main=[pn.Column(yaxis_ghg, co2_plot.panel(width=700), margin=(0,25)),
pn.Column(yaxis_ghg, ghg_bar_plot.panel(width=800))],
accent_base_color="#eab676",
header_background="#1e81b0",
)
template.show()
template.servable();
Launching server at http://localhost:57657
C:\Users\hurdg\anaconda3\envs\geo_env\Lib\site-packages\holoviews\core\data\pandas.py:39: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]` return dataset.data.dtypes[idx].type C:\Users\hurdg\anaconda3\envs\geo_env\Lib\site-packages\holoviews\core\data\pandas.py:39: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]` return dataset.data.dtypes[idx].type C:\Users\hurdg\anaconda3\envs\geo_env\Lib\site-packages\holoviews\core\data\pandas.py:39: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]` return dataset.data.dtypes[idx].type WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: Column(id='f3098231-a6f5-43ad-bfe4-6f651a0fd721', ...) WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: Column(id='e190af2e-6838-47c2-b3aa-e1dd3401e0cf', ...)
We will transpose the first data table and call it , by this step we make it easy to use the choosen Points and to name it in propper way in the graphs
G=Ghg.transpose()
G.columns = G.iloc[0]
G=G.iloc[1:]
G = G.rename(columns={'Unnamed: 0': 'Year'})
G=G.round(2 )
G
| Source of GHG | Total GHG Emissions Including Electricity (Mt of CO2e) | Residential (Mt of CO2e) | Space Heating - Residential | Water Heating - Residential | Appliances | Major Appliances | Other Appliances | Lighting - Residential | Space Cooling - Residential | Commercial/Institutional (Mt of CO2e) | ... | Rail - Passenger | Freight Transportation (Mt of CO2e) | Light Trucks - Freight | Medium Trucks | Heavy Trucks | Air - Freight | Rail - Freight | Marine | Off-Road (Mt of CO2e) | Agriculture (Mt of CO2e) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 465.71 | 74.53 | 46.18 | 13.84 | 10.46 | 7.8 | 2.66 | 3.28 | 0.78 | 55.11 | ... | 0.23 | 64.81 | 10.24 | 10.77 | 28.97 | 0.56 | 6.36 | 7.92 | 5.89 | 15.6 |
| 2001 | 463.93 | 74.14 | 43.3 | 14.32 | 11.49 | 8.43 | 3.06 | 3.66 | 1.39 | 56.54 | ... | 0.23 | 65.24 | 10.38 | 12.35 | 27.18 | 0.47 | 6.31 | 8.55 | 6.56 | 15.0 |
| 2002 | 469.76 | 75.76 | 45.46 | 14.22 | 10.95 | 7.88 | 3.07 | 3.49 | 1.63 | 59.06 | ... | 0.21 | 64.0 | 10.77 | 11.94 | 26.84 | 0.51 | 5.74 | 8.2 | 6.73 | 14.33 |
| 2003 | 487.54 | 78.59 | 47.5 | 14.58 | 11.47 | 8.11 | 3.36 | 3.73 | 1.31 | 62.03 | ... | 0.2 | 68.47 | 10.93 | 14.04 | 28.93 | 0.47 | 5.81 | 8.3 | 6.81 | 14.98 |
| 2004 | 491.29 | 77.22 | 47.2 | 14.39 | 11.1 | 7.7 | 3.39 | 3.57 | 0.96 | 59.02 | ... | 0.2 | 72.57 | 11.1 | 15.58 | 30.23 | 0.5 | 5.98 | 9.18 | 6.97 | 14.87 |
| 2005 | 485.46 | 75.06 | 45.34 | 14.23 | 10.38 | 7.1 | 3.27 | 3.28 | 1.84 | 55.98 | ... | 0.21 | 74.26 | 11.25 | 14.53 | 32.15 | 0.54 | 6.37 | 9.41 | 7.12 | 15.13 |
| 2006 | 476.56 | 71.5 | 42.57 | 14.06 | 10.27 | 6.91 | 3.36 | 3.18 | 1.41 | 52.16 | ... | 0.21 | 74.44 | 11.08 | 16.81 | 31.08 | 0.5 | 6.68 | 8.29 | 7.21 | 15.24 |
| 2007 | 502.37 | 78.11 | 48.39 | 14.5 | 10.54 | 6.97 | 3.58 | 3.19 | 1.48 | 54.51 | ... | 0.22 | 78.3 | 11.72 | 17.28 | 32.48 | 0.41 | 7.16 | 9.25 | 7.31 | 15.7 |
| 2008 | 488.5 | 75.67 | 47.83 | 13.84 | 9.98 | 6.45 | 3.52 | 3.0 | 1.02 | 53.39 | ... | 0.25 | 79.4 | 11.61 | 18.22 | 32.68 | 0.34 | 7.54 | 9.0 | 7.39 | 15.76 |
| 2009 | 465.27 | 71.51 | 45.72 | 13.03 | 9.23 | 5.85 | 3.38 | 2.76 | 0.77 | 50.68 | ... | 0.23 | 78.8 | 11.77 | 19.44 | 32.15 | 0.31 | 6.42 | 8.71 | 7.35 | 13.31 |
| 2010 | 479.88 | 69.64 | 42.7 | 13.03 | 9.53 | 5.86 | 3.67 | 2.85 | 1.53 | 50.47 | ... | 0.19 | 82.93 | 12.15 | 21.8 | 33.26 | 0.37 | 6.34 | 9.02 | 7.4 | 15.63 |
| 2011 | 486.93 | 69.93 | 44.44 | 13.17 | 8.49 | 5.13 | 3.36 | 2.48 | 1.35 | 50.09 | ... | 0.22 | 83.23 | 12.16 | 21.36 | 34.96 | 0.38 | 7.14 | 7.23 | 7.52 | 16.94 |
| 2012 | 479.66 | 64.79 | 40.23 | 12.65 | 8.17 | 4.82 | 3.35 | 2.33 | 1.41 | 46.81 | ... | 0.19 | 83.34 | 12.45 | 21.15 | 35.03 | 0.42 | 7.35 | 6.95 | 7.62 | 16.47 |
| 2013 | 484.53 | 66.42 | 42.93 | 12.36 | 7.98 | 4.62 | 3.35 | 2.21 | 0.94 | 47.57 | ... | 0.16 | 84.53 | 12.96 | 22.13 | 35.36 | 0.44 | 7.09 | 6.55 | 7.76 | 17.15 |
| 2014 | 484.33 | 66.36 | 43.75 | 11.77 | 7.86 | 4.48 | 3.38 | 2.14 | 0.84 | 49.61 | ... | 0.15 | 83.68 | 12.98 | 21.54 | 35.57 | 0.43 | 7.28 | 5.88 | 7.93 | 17.76 |
| 2015 | 487.94 | 66.3 | 42.75 | 12.12 | 8.14 | 4.58 | 3.56 | 2.17 | 1.12 | 49.4 | ... | 0.16 | 80.45 | 13.55 | 20.62 | 33.6 | 0.39 | 6.92 | 5.36 | 8.13 | 18.35 |
| 2016 | 477.75 | 61.05 | 39.03 | 11.39 | 7.39 | 4.12 | 3.27 | 1.93 | 1.32 | 49.4 | ... | 0.16 | 77.47 | 14.54 | 20.03 | 31.43 | 0.41 | 6.35 | 4.71 | 8.33 | 18.55 |
| 2017 | 493.83 | 63.06 | 41.09 | 11.87 | 7.26 | 4.02 | 3.25 | 1.87 | 0.97 | 52.33 | ... | 0.18 | 81.71 | 15.07 | 20.55 | 33.24 | 0.49 | 7.27 | 5.09 | 8.46 | 18.99 |
| 2018 | 511.12 | 65.04 | 43.35 | 12.17 | 6.67 | 3.67 | 2.99 | 1.67 | 1.19 | 52.26 | ... | 0.17 | 85.34 | 15.68 | 21.64 | 34.76 | 0.55 | 7.43 | 5.28 | 8.64 | 19.57 |
| 2019 | 513.54 | 61.5 | 41.18 | 11.26 | 6.63 | 3.59 | 3.04 | 1.63 | 0.79 | 54.23 | ... | 0.18 | 86.28 | 16.21 | 21.29 | 34.73 | 0.53 | 7.48 | 6.04 | 8.77 | 18.84 |
| 2020 | 454.5 | 57.1 | 36.79 | 11.4 | 6.32 | 3.41 | 2.9 | 1.55 | 1.05 | 50.5 | ... | 0.09 | 78.18 | 14.01 | 18.73 | 32.34 | 0.65 | 7.06 | 5.39 | 8.82 | 18.0 |
21 rows × 45 columns
Now we take the Total values of GHG emmissions for each section to compare the GDP progress with the GHG emmissons to find the relation between them
t=Ghg.iloc[[0, 1, 9, 17, 28, 29, 36]]
total=t.transpose()
total.columns = total.iloc[0]
total=total.iloc[1:]
total = total.rename(columns={'Unnamed: 0': 'Year'})
total=total.round(2 )
total.index = total.index.astype(str)
total
| Source of GHG | Total GHG Emissions Including Electricity (Mt of CO2e) | Residential (Mt of CO2e) | Commercial/Institutional (Mt of CO2e) | Industrial (Mt of CO2e) | Total Transportation (Mt of CO2e) | Passenger Transportation (Mt of CO2e) | Freight Transportation (Mt of CO2e) |
|---|---|---|---|---|---|---|---|
| 2000 | 465.71 | 74.53 | 55.11 | 160.36 | 160.11 | 89.41 | 64.81 |
| 2001 | 463.93 | 74.14 | 56.54 | 158.73 | 159.52 | 87.72 | 65.24 |
| 2002 | 469.76 | 75.76 | 59.06 | 159.2 | 161.41 | 90.68 | 64.0 |
| 2003 | 487.54 | 78.59 | 62.03 | 165.74 | 166.21 | 90.92 | 68.47 |
| 2004 | 491.29 | 77.22 | 59.02 | 168.37 | 171.82 | 92.28 | 72.57 |
| 2005 | 485.46 | 75.06 | 55.98 | 164.39 | 174.9 | 93.51 | 74.26 |
| 2006 | 476.56 | 71.5 | 52.16 | 164.72 | 172.93 | 91.28 | 74.44 |
| 2007 | 502.37 | 78.11 | 54.51 | 174.26 | 179.79 | 94.18 | 78.3 |
| 2008 | 488.5 | 75.67 | 53.39 | 165.3 | 178.37 | 91.57 | 79.4 |
| 2009 | 465.27 | 71.51 | 50.68 | 153.23 | 176.54 | 90.38 | 78.8 |
| 2010 | 479.88 | 69.64 | 50.47 | 162.09 | 182.04 | 91.71 | 82.93 |
| 2011 | 486.93 | 69.93 | 50.09 | 168.17 | 181.81 | 91.06 | 83.23 |
| 2012 | 479.66 | 64.79 | 46.81 | 168.38 | 183.22 | 92.26 | 83.34 |
| 2013 | 484.53 | 66.42 | 47.57 | 166.85 | 186.54 | 94.25 | 84.53 |
| 2014 | 484.33 | 66.36 | 49.61 | 167.36 | 183.24 | 91.63 | 83.68 |
| 2015 | 487.94 | 66.3 | 49.4 | 171.68 | 182.22 | 93.65 | 80.45 |
| 2016 | 477.75 | 61.05 | 49.4 | 166.88 | 181.87 | 96.07 | 77.47 |
| 2017 | 493.83 | 63.06 | 52.33 | 171.76 | 187.69 | 97.53 | 81.71 |
| 2018 | 511.12 | 65.04 | 52.26 | 179.2 | 195.05 | 101.06 | 85.34 |
| 2019 | 513.54 | 61.5 | 54.23 | 181.7 | 197.27 | 102.22 | 86.28 |
| 2020 | 454.5 | 57.1 | 50.5 | 165.61 | 163.29 | 76.29 | 78.18 |
from pydoc import locate
from ipywidgets import Label
plt.figure(figsize=(8, 6))
plt.plot(total.index,total['Residential (Mt of CO2e)'], marker='o',label='Residential ')
plt.plot(total.index,total['Commercial/Institutional (Mt of CO2e)'], marker='o',label='Commercial/Institutional')
plt.plot(total.index,total['Industrial (Mt of CO2e)'], marker='o',label='Industrial')
plt.plot(total.index,total['Passenger Transportation (Mt of CO2e)'], marker='o',label='Passenger Transportation')
plt.plot(total.index,total['Freight Transportation (Mt of CO2e)'], marker='o',label='Freight Transportation')
plt.xlabel('Year')
plt.ylabel('Emissions (Mt of CO2e)')
plt.title('Total GHG Emissions Over the Years')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
By taking A closer Look at the second data set to understand the relation Between GDP and GHG in Canada
BC=CB.transpose()
BC.columns = BC.iloc[0]
BC=BC.iloc[1:]
BC = BC.rename(columns={'Unnamed: 0': 'Year'})
BC=BC.round(2 )
plt.figure(figsize=(10, 6))
plt.plot(BC.index, BC['Industrial'], label='Industrial')
plt.plot(BC.index, BC['Commercial/Institutional'], label='Commercial/Institutional')
plt.plot(BC.index, BC['Agriculture '], label='Agriculture')
plt.xlabel('Years')
plt.ylabel('Million')
plt.title('Total GDP Millions Over Years')
plt.legend()
plt.grid(True)
plt.show()
By looking at the last two paragraphs whe can see that the Commercial domain is making a big fast progress in the income for the 20 years of our study and the industrial and the agriculture is having almost a stedy line but in the other side with all the fast progress the comercial domain is having the losest numbers of the other sections will the industrial department is dominating all other sections with the higest amouts of gas emmissions and that give us a big note to controle this emmissions
ofcource we are need to find the biggest factor for CO2 in each sector The first Data Visualising is the Freight Transportation we Choose the Columns from from G
Freight_Transportation=G[[ 'Light Trucks - Freight',' Medium Trucks', ' Heavy Trucks',
'Air - Freight', 'Rail - Freight', ' Marine']]
ft=Freight_Transportation.transpose()
c=ft.sum(axis=1)
plt.figure(figsize=(7, 7))
plt.pie(c,labels=c.index,wedgeprops=dict(width=0.4),autopct='%1.1f%%')
plt.title("Freight Transportation Total by Category")
Text(0.5, 1.0, 'Freight Transportation Total by Category')
we found that the Heavy Trucks is causing the highest amount of CO2 and then Medium Truck Now to create the Secont Graph we did the same steps but we choose the Pie chart while the first one was donat chart
Passenger_Transportation=G[[ 'Cars',
'Light Trucks - Passenger', 'Buses', 'Air - Passenger',
'Rail - Passenger']]
PT=Passenger_Transportation.transpose()
pt=PT.sum(axis=1)
plt.figure(figsize=(8, 8))
plt.pie(pt,autopct='%1.1f%%')
plt.title("Passenger Transportation Total by Category")
plt.legend(pt.index,loc='upper right')
<matplotlib.legend.Legend at 0x25535568750>
from the chart above we can see that from the Total Passenger transportation the Cars is causing 42.5% of All ,this resault should encourage us to take a step to reduce this amount of gas emmissions as each one of us is responsable in this section to check on the Industry Gas Emmissions we sellected all needed columns from G and used Bar plot to view the Result
Industrial=G[[ 'Mining , Quarrying, and Oil and Gas Extraction', 'Pulp and Paper',
'Iron and Steel', 'Smelting and Refining', 'Cement', 'Chemicals',
'Petroleum Refining', 'Other Manufacturing', 'Forestry ',
'Construction']]
I=Industrial.transpose()
i=I.sum(axis=1)
i.plot(kind='bar', stacked=True,color=(['#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a']))
plt.title("industrial Total by Category")
plt.ylabel('Emissions (Mt of CO2e)')
Text(0, 0.5, 'Emissions (Mt of CO2e)')
we can see that the Mining ,and Gas Extraction is waving a red flag for the invernement with aount of CO2 ,Now to make a closer look at Residential and Commercial Secions we used Seaborn tho plot the next graphs
Residential=G[['Space Heating - Residential','Space Cooling - Residential','Water Heating - Residential','Lighting - Residential']
]
g = sns.catplot(
data=Residential,
kind="bar",
alpha=.6, height=6)
g.set_xticklabels(['Space Heating ','Space Cooling ',
'Water Heating ', 'Lighting '])
plt.ylabel('Emissions (Mt of CO2e)')
g.set_axis_labels("Sub Category", "Emissions (Mt of CO2e)")
plt.title("Residential Total by Category")
Text(0.5, 1.0, 'Residential Total by Category')
Commercial=G[[ 'Space Heating - Commercial', 'Water Heating - Commercial',
'Auxiliary Equipment', 'Auxiliary Motors', 'Lighting - Commercial',
'Space Cooling - Commercial', 'Street Lighting']]
sns.set_theme(style="ticks", palette="pastel")
k=sns.boxplot(
palette=["m", "g","y","b","r"],
data=Commercial)
k.set_xticklabels(['S-Heat', 'W-Heat ',
'Equipment', 'Motors', 'Lighting ',
'Cooling ', 'St-Lighting'])
sns.despine(offset=10, trim=True)
C:\Users\hurdg\AppData\Local\Temp\ipykernel_15172\2750093320.py:5: UserWarning: The palette list has fewer values (5) than needed (7) and will cycle, which may produce an uninterpretable plot. k=sns.boxplot( C:\Users\hurdg\AppData\Local\Temp\ipykernel_15172\2750093320.py:8: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator. k.set_xticklabels(['S-Heat', 'W-Heat ',
It is not surpricing that the Place heating is producing the Highest numbers Gas Emmisions and we need to be responsable in our houses and work places to limit the emmssions as much as we can
Data Wrangling and Visualization for Guiding Question 2¶
To answer our second guiding question, we first aimed to identify the regions making the most significant contributions to total GHG emissions. We approached this by creating line plots to visualize the emission trends from 2000 to 2021. To do this, we defined a list of regions based on the last characters of our Excel files and used a for loop, along with an f string, to read these files. We filtered each file by a specific column and row using the loc function, transposed the data frames, and combined them. We then customized the column names and generated plots using Plotly.
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
# Define a list of Provinces and Territories
regions = ["AB", "BC", "MB","NB","NL","NS","NT","NU","ON","PE","QC","SK","YT"]
# Initialize an empty DataFrame to store the combined data
combined_data = pd.DataFrame()
# Loop through each Province and Territory excel file and select the total GHG column
for region in regions:
# Define the file name based on the region code from above
file_name = f"EN_GHG_IPCC_{region}.xlsx"
# Read the Excel file, and select the Summary sheet
df = pd.read_excel(file_name, sheet_name='Summary')
# Select rows and columns containing total GHG
df = df.loc[[3, 5], 'Unnamed: 15':]
# Rename rows
df = df.rename(index={3: 'Year', 5: f'ghg_{region}'})
# Transpose the DataFrame
df = df.transpose()
# Combine data for this region with the combined_data DataFrame
if combined_data.empty:
combined_data = df
else:
combined_data = pd.merge(combined_data, df, on="Year", how="outer")
# Define a list of appropriate column names
column_names = ['Year','Alberta',"British Columbia","Manitoba","New Brunswick","Newfoundland & Labrador","Nova Scotia","Northwest Territories","Nunavut","Ontario", "PE Island","Quebec","Saskatchewan","Yukon"]
# Set the column names of the DataFrame
combined_data.columns = column_names
# Create a line plot
fig = px.line(
combined_data,
x="Year",
y=combined_data.columns[1:],
color_discrete_sequence=px.colors.qualitative.Light24,
title="Figure 1b. Total Greenhouse Gas Emissions by Province and Territory in Canada (2000-2021)")
fig.update_yaxes(title_text="GHG Emissions (kt CO_2 eq)")
fig.update_yaxes(dtick=20000)
fig.update_layout(
width=850, #Adjust the width
height=750# Adjust the height
)
#Removing legend title
fig.update_layout(
legend_title_text="")
fig.update_layout(
xaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
plot_bgcolor='white', # Set background color
legend=dict(orientation='h', yanchor='bottom', y=-0.2, xanchor='right', x=1)
)
#Allows for zooming into axis
fig.update_xaxes(automargin=True)
fig.update_yaxes(automargin=True)
fig.show()
For a different perspective on major contributors, we also created a pie chart based on the filtered data for the year 2019.
#extracting ghg values from dataframe combined_data
values = combined_data.loc[19, 'Alberta':'Yukon']
#Pie chart
fig = px.pie(values=values, names=values.index)
# Customize the layout
fig.update_layout(
title='Figure 2b. Pie Chart of GHG Emissions for the Provinces and Territories of Canada ',
)
# Show the chart
fig.show()
GHG Per capita figure generation. In a separate section of code, we processed a population CSV file. After removing missing values and resetting the index, we filtered the data to retain only Q1 data for each year. We used regular expressions to extract the four-year digit and create a new column called "Year." Similarly, we did this for the two-character quarter data and placed it in another new column. We then filtered the data to include only rows corresponding to Q1. Following this, we sorted and reordered the columns to achieve a specific structure. From this refined DataFrame, named df_pop, we further filtered the data to obtain the 2019 population statistics. We followed a similar process for GHG data for the year 2019. Finally, we divided the GHG column by the population column and created a bar graph using Matplotlib. This bar plot visualizes the relationship between per capita GHG emissions for each province and territory for the year 2019 to facilitate comparative analysis.
import pandas as pd
import matplotlib.pyplot as plt
#Reading csv file
df_pop = pd.read_csv('Population.csv', encoding='latin-1').dropna(axis=0)
#Make row 0 as header
df_pop.columns = df_pop.iloc[0]
# Reset the index to exclude the first row
df_pop = df_pop[1:].reset_index(drop=True)
df_pop['Year'] = df_pop['Geography'].str.extract(r'(\d{4})')[0]
df_pop['Quarter'] = df_pop['Geography'].str.extract(r'(Q\d)')[0]
# Filter to show only Q1 for each year
df_pop = df_pop[df_pop['Quarter'] == 'Q1']
# Drop the 'Quarter' column if you no longer need it
df_pop = df_pop.drop(columns=['Quarter'])
#rest index
df_pop = df_pop[1:].reset_index(drop=True)
#sort columns by alphabebetical order
df_pop = df_pop[sorted(df_pop.columns)]
df_pop = df_pop[['Geography', 'Year', 'Canada'] + [col for col in df_pop.columns if col not in ['Geography', 'Year', 'Canada']]]
#For population data
# Extract the columns
columns_to_sort = df_pop.loc[:, 'Alberta':'Yukon']
# Sort the columns alphabetically
sorted_columns = sorted(columns_to_sort.columns)
# Reassign the sorted columns to the DataFrame
df_pops = df_pop[sorted_columns]
#Renaming the columns to match ghg columns
column_names_pop = ["Alberta", "British Columbia", "Manitoba", "New Brunswick", "Newfoundland & Labrador","Northwest Territories", "Nova Scotia", "Nunavut", "Ontario", "PE Island", "Quebec", "Saskatchewan", "Yukon"]
# Set the column names of the DataFrame
df_pops.columns = column_names_pop
row_2019_pop = df_pops.loc[20, :]
#For GHG data
# Extracting the columns
column_sorting_ghg = combined_data.loc[:, 'Alberta':'Yukon']
#Sorting the column alphabetically
sorted_columns_ghg =sorted(column_sorting_ghg.columns)
# Set the column names of the DataFrame
new_combined_data = combined_data[sorted_columns_ghg]
row_2019_ghg = new_combined_data.loc[20, :]
#Making sure data types are addressed
row_2019_pop = row_2019_pop.str.replace(',', '').astype(float)
row_2019_ghg = pd.to_numeric(row_2019_ghg, errors='coerce')
#Calculating per capita
per_capita_ghg = row_2019_ghg/row_2019_pop
#print(per_capita_ghg)
# Create a Pandas Series
per_capita_ghg = pd.Series(per_capita_ghg)
# Labeling the plot
plt.title('GHG Emissions Per Capita')
plt.xlabel('Provinces and Territories')
plt.ylabel('Emissions Per Capita')
per_capita_ghg = per_capita_ghg.sort_values(ascending=False)
# Plot a horizontal bar graph
per_capita_ghg.plot(kind='barh')
# Customize the plot
plt.title('Figure 3b. GHG Emissions Per Capita (Sorted)(2019)')
plt.xlabel('Emissions Per Capita')
plt.ylabel('Provinces and Territories')
# Show the plot
plt.show()
In this portion of the code the same was done as before with regards to reading the multiple excel files. Again, to do this, we defined a list of regions based on the last characters of our Excel files and used a for loop, along with an f string, to read these files. We filtered each file by a specific column and row using the loc function, transposed the data frames, and combined them. Here we wanted to filter GHG emissions coming from energy generation so we could use that value as the size of our bubbles in the scatter plot.
import pandas as pd
# Define a list of regions (e.g., Alberta, British Columbia, etc.)
regions = ["AB", "BC", "MB", "NB", "NL", "NT", "NS", "NU", "ON", "PE", "QC", "SK", "YT"]
# Initialize an empty DataFrame to store the combined data
combined_data = pd.DataFrame()
# Loop through each region
for region in regions:
# Define the file name based on the region
file_name = f"EN_GHG_IPCC_{region}.xlsx"
# Check if the file exists
# Read the Excel file
df = pd.read_excel(file_name, sheet_name='Summary')
# Select rows and columns containing the relevant data
df = df.loc[[3, 6], 'Unnamed: 15':]
# Rename rows
df = df.rename(index={3: 'Year', 6: f'ghg_{region}'})
# Transpose the DataFrame
df = df.transpose()
# Combine data for this region with the combined_data DataFrame
if combined_data.empty:
combined_data = df
else:
combined_data = pd.merge(combined_data, df, on="Year", how="outer")
# Extract row index 0 for energy sector ghg for the year 2000
energy_sec_2000 = combined_data.iloc[0]
energy_sector_2000 = energy_sec_2000[1:].tolist()
energy_sector_2000[7]=503
energy_sector_2000[5]=1487
# Extract row index 10 for energy sector ghg for the year 2010
row_10 = combined_data.iloc[10]
energy_sector_2010 = row_10[1:].tolist()
# Extract row index 19 for energy sector ghg for the year 2019
energy_sec_2019 = combined_data.iloc[19]
energy_sector_2019 = energy_sec_2019[1:].tolist()
This code segment is responsible for creating three separate dataframes for the years 2000, 2010, and 2019, each containing information about various Canadian provinces and territories. The data includes key attributes such as province codes, total greenhouse gas GHG emissions, population, and a "Bubble_Size" parameter related to the energy sectors GHG contributions. These dataframes are structured to enable a comparative analysis across the years.
To distinguish these dataframes by year, a 'Year' column is added to each, specifying the respective year (2000, 2010, or 2019). Afterward, these dataframes are combined into one consolidated dataframe named 'combined_df' using the 'pd.concat' function. The data is then utilized to create a scatter plot using Plotly Express. This scatter plot visualizes the relationship between population, total GHG emissions, and the bubble size(GHG emissions from energy generation) while distinguishing data points by the year to facilitate comparative analysis.
import pandas as pd
import plotly.express as px
# Extract row index 10 to get total GHG emission data for year 2000
year_2000 = combined_data.iloc[0]
total_ghg_2000 = year_2000[1:].tolist()
total_ghg_2000[7]=528
total_ghg_2000[5]=1530
# Extract row index 10 to get total GHG emission data for year 2010
year_2010 = combined_data.iloc[10]
total_ghg_2010 = year_2010[1:].tolist()
# Extract row index 19 to get total GHG emission data for the year 2019
year_2019 = combined_data.iloc[19]
total_ghg_2019 = year_2019[1:].tolist()
#2000 dataframe
pop_2000 = df_pop.loc[0, 'Alberta':'Yukon'].tolist()
data = {
'Province': ["AB", "BC", "MB", "NB", "NL", "NT", "NS", "NU", "ON", "PE", "QC", "SK", "YT"],
'Total_GHG_Emissions': total_ghg_2000,
'Population': pop_2000,
'Bubble_Size': energy_sector_2000
}
#2010 dataframe
df_2000 = pd.DataFrame(data)
pop_2010 = df_pop.loc[9, 'Alberta':'Yukon'].tolist()
data = {
'Province': ["AB", "BC", "MB", "NB", "NL", "NT", "NS", "NU", "ON", "PE", "QC", "SK", "YT"],
'Total_GHG_Emissions': total_ghg_2010,
'Population': pop_2010,
'Bubble_Size': energy_sector_2010
}
df_2010 = pd.DataFrame(data)
#2019 dataframe
pop_2019 = df_pop.loc[19, 'Alberta':'Yukon'].tolist()
data = {
'Province': ["AB", "BC", "MB", "NB", "NL", "NT", "NS", "NU", "ON", "PE", "QC", "SK", "YT"],
'Total_GHG_Emissions': total_ghg_2019,
'Population': pop_2019,
'Bubble_Size': energy_sector_2019
}
df_2019 = pd.DataFrame(data)
# Add a column to distinguish the data frames
df_2010['Year'] = '2010'
df_2000['Year'] = '2000'
df_2019['Year'] = '2019'
# Combine the two data frames
combined_df = pd.concat([df_2019, df_2010,df_2000], ignore_index=True)
# Create a scatter plot using Plotly Express
fig = px.scatter(
combined_df,
x='Population',
y='Total_GHG_Emissions',
size='Bubble_Size',
color='Year',
text ='Province',
hover_name='Province',
size_max=50
)
fig.update_layout(
xaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='lightgray'),
plot_bgcolor='white', #
)
# Set axis titles
fig.update_xaxes(title_text='Population')
fig.update_xaxes(type='linear')
fig.update_yaxes(title_text='Total GHG Emissions (kt CO_2 eq)')
# Set the title
fig.update_layout(title='Figure 4b. Comparison of GHG Emissions and Population by Province/Territory (2000 vs. 2010 vs. 2019)')
# Enable zoom and pan options
fig.update_xaxes(automargin=True)
fig.update_yaxes(automargin=True)
# Show the plot
fig.show()
To further analyze per capita GHG emissions, the below code builds upon the above analyses with the goal of visualizing annual provincial/territorial per capita GHG emissions for all study years. The dataframes containing the total GHG emissions and the population data are re-frmatted and combined with a shapefile that contains provincial/territorial geometries. The resulting geo-dataframe is visualized through an interactive choropleth within the plotly environment.
#### The below code re-runs the intial data import and manipulation code to re-create the original GHG dataframes ##
# Define a list of Provinces and Territories
regions = ["AB", "BC", "MB","NB","NL","NS","NT","NU","ON","PE","QC","SK","YT"]
# Initialize an empty DataFrame to store the combined data
combined_data_copy = pd.DataFrame()
# Loop through each Province and Territory excel file and select the total GHG column
for region in regions:
# Define the file name based on the region code from above
file_name = f"EN_GHG_IPCC_{region}.xlsx"
# Read the Excel file, and select the Summary sheet
df = pd.read_excel(file_name, sheet_name='Summary')
# Select rows and columns containing total GHG
df = df.loc[[3, 5], 'Unnamed: 15':]
# Rename rows
df = df.rename(index={3: 'Year', 5: f'ghg_{region}'})
# Transpose the DataFrame
df = df.transpose()
# Combine data for this region with the combined_data DataFrame
if combined_data_copy.empty:
combined_data_copy = df
else:
combined_data_copy = pd.merge(combined_data_copy, df, on="Year", how="outer")
# Define a list of appropriate column names
column_names = ['Year','Alberta',"British Columbia","Manitoba","New Brunswick","Newfoundland & Labrador","Nova Scotia","Northwest Territories","Nunavut","Ontario", "PE Island","Quebec","Saskatchewan","Yukon"]
# Set the column names of the DataFrame
combined_data_copy.columns = column_names
#Ammend the original population dataframes
#Reformat df_pops dataframe to incclude year - specify new dataframe name "df_pop_wUear"
sorted_columns.append('Year')
df_pops_wYear = df_pop[sorted_columns]
column_names_pop_wYear = ["Alberta", "British Columbia", "Manitoba", "New Brunswick", "Newfoundland & Labrador","Northwest Territories", "Nova Scotia", "Nunavut", "Ontario", "PE Island", "Quebec", "Saskatchewan", "Yukon", "Year"]
df_pops_wYear.columns = column_names_pop_wYear
#Melt regional GHG dataframe into 'long' format - required for mapping function
regionGHG_long = pd.melt(combined_data_copy, id_vars=['Year'], var_name='Province')
regionGHG_long.rename(columns={'value': 'GHG'}, inplace=True)
regionGHG_long['GHG'].fillna(0, inplace=True)
regionGHG_long[['Year','GHG']] = regionGHG_long[['Year','GHG']].astype(int)
regionGHG_long['Province'] = regionGHG_long['Province'].astype(str)
#Similarly melt the population dataframe
regionPop_long = pd.melt(df_pops_wYear, id_vars=['Year'], var_name='Province')
regionPop_long.rename(columns={'value': 'Population'}, inplace=True)
regionPop_long.replace(',','', regex=True, inplace=True)
regionPop_long[['Year','Population']] = regionPop_long[['Year','Population']].astype(int)
regionPop_long['Province'] = regionPop_long['Province'].astype(str)
#Merge the population and GHG dataframes and create a per-capita variable - multiply by 1000 to simplify units
regionPerCap_long = regionPop_long.merge(regionGHG_long, on=['Year','Province'])
regionPerCap_long['ghg_percapita'] = regionPerCap_long['GHG']/regionPerCap_long['Population'] *1000
#Import shapefile of provincial/territorial geometries
#Format the dataframe to mesh with the per-capita dataframe - specify function to remove unwanted characters from province column
import geopandas as gpd
geo_df = gpd.read_file("C:/Users/hurdg/OneDrive/Documents/MDSA/DATA 601/Data 601 - Project/Datasets/georef-canada-province-millesime.shp")
geo_df.rename(columns={'prov_name_e':'Province'}, inplace=True)
geo_df = geo_df.loc[:, ['Province', 'geometry']]
def remove_unwanted(x):
x = x.replace("]", "")
x = x.replace("[", "")
return x.translate(str.maketrans({"'":None}))
geo_df['Province'] = geo_df['Province'].apply(remove_unwanted)
#Merge per-capita dataframe with geometries
geo_df = geo_df.merge(regionPerCap_long, on = 'Province')
#Create json using 2010 data (arbitrary) to act as geospatial reference for choropleth
gdf_2010 = geo_df[geo_df['Year'] == 2010]
geo_df['Year'] = geo_df['Year'].astype(str)
json_2010 = gdf_2010.to_json()
import json
json_2010 = json.loads(json_2010)
#Create choropleth within plotly
fig = px.choropleth(
geo_df,
locations="Province",
featureidkey="properties.Province", ### ! changed !
geojson=json_2010,
color=geo_df.ghg_percapita.astype(float),
hover_name="Province",
range_color=(0, 100),
color_continuous_scale='YlOrBr',
animation_frame=geo_df.Year.astype(str)
)
#Format slider bar label and font
sliders = [dict(
currentvalue={"prefix": "<b>Year </b>"},
font=dict(size=20,color="Black"))]
fig.update_layout(sliders=sliders)
#Specify figure title
fig.update_layout(
title_text = 'Figure 5b.Per Capita GHG Emissions',
#Specify colorbar to appear on left side of map
coloraxis_colorbar_x=-0.1)
#Snap initial figure frame around map
fig.update_geos(fitbounds="locations", visible=False)
fig.show()